智能论文笔记

Learning Graph Neural Networks for Image Style Transfer

Yongcheng Jing , Yining Mao , Yiding Yang , Yibing Zhan , Mingli Song , Xinchao Wang , Dacheng Tao

分类：计算机视觉

2022-07-24

最先进的参数和非参数样式转移方法容易导致由于全局统计的对准而导致的本地样式模式，或者由于补丁不匹配而导致的不愉快的人工制品。在本文中，我们研究了一种新型的半参数神经风格转移框架，可减轻参数和非参数风格的缺乏。我们方法的核心思想是使用图神经网络（GNN）建立准确且细粒的内容样式对应关系。为此，我们开发了一个详细的GNN模型，其中包含内容和样式的本地补丁作为图形顶点。然后，将样式转移过程建模为基于注意力的异质消息，以可学习的方式在样式和内容节点之间传递，从而导致本地补丁级别的自适应多一对一风格的相关性。此外，引入了详细的可变形图卷积操作，以进行跨尺度样式符合匹配。实验结果表明，所提出的半参数图像样式化方法可为具有挑战性的样式模式产生令人鼓舞的结果，从而保留了全球外观和精美的细节。此外，通过控制推理阶段的边缘数量，提出的方法还触发了新的功能，例如使用单个模型的多元化基于斑块的风格化。

translated by 谷歌翻译

Safe Distillation Box

Jingwen Ye , Yining Mao , Jie Song , Xinchao Wang , Cheng Jin , Mingli Song

分类：人工智能 | 计算机视觉 | 机器学习

2021-12-05

知识蒸馏（KD）最近被出现为将学生预先接受教师模型转移到轻量级学生的知识的强大战略，并在广泛的应用方面表现出了前所未有的成功。尽管结果令人鼓舞的结果，但KD流程本身对网络所有权保护构成了潜在的威胁，因为网络中包含的知识可以毫不费力地蒸馏，因此暴露于恶意用户。在本文中，我们提出了一种新颖的框架，称为安全蒸馏盒（SDB），允许我们将预先训练的模型包装在虚拟盒中用于知识产权保护。具体地，SDB将包装模型的推理能力保留给所有用户，但从未经授权的用户中排除KD。另一方面，对于授权用户，SDB执行知识增强方案，以加强KD性能和学生模型的结果。换句话说，所有用户都可以在SDB中使用模型进行推断，但只有授权用户只能从模型中访问KD。所提出的SDB对模型架构不对限制，并且可以易于作为即插即用解决方案，以保护预先训练的网络的所有权。各个数据集和架构的实验表明，对于SDB，未经授权的KD的性能显着下降，而授权的销量会增强，展示SDB的有效性。

translated by 谷歌翻译

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

分类：计算机视觉

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.

translated by 谷歌翻译

Optimization of Image Transmission in a Cooperative Semantic Communication Networks

Wenjing Zhang , Yining Wang , Mingzhe Chen , Tao Luo , Dusit Niyato

分类：人工智能 | 计算机视觉

2023-01-01

In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users utilizing semantic communication techniques. To evaluate the performance of studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem aiming to minimize each server's transmission latency while reaching the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) is proposed, which enables servers to coordinate for training and execute RB allocation in a distributed manner to approach to a globally optimal performance with less training iterations. Compared to traditional multi-agent RL, the proposed RL improves the valuable action exploration of servers and the probability of finding a globally optimal RB allocation policy based on local observation. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.

translated by 谷歌翻译

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

分类：计算机视觉 | 机器学习

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

translated by 谷歌翻译

Transformer in Transformer as Backbone for Deep Reinforcement Learning

Hangyu Mao , Rui Zhao , Hao Chen , Jianye Hao , Yiqun Chen , Dong Li , Junge Zhang , Zhen Xiao

分类：机器学习 | 人工智能 | 机器人

2022-12-30

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network with several modules like CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, it requires tedious optimization skills to train a network composed of mixed modules, making these methods inconvenient to be used in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings, consistently.

translated by 谷歌翻译

STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension

Borui Wang , Chengcheng Feng , Arjun Nair , Madelyn Mao , Jai Desai , Asli Celikyilmaz , Haoran Li , Yashar Mehdad , Dragomir Radev

分类：自然语言处理

2022-12-24

Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. We further collect human annotations of STRUDEL summaries over 400 dialogues and introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a graph-neural-network-based dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension abilities. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.

translated by 谷歌翻译

Generation-Augmented Query Expansion For Code Retrieval

Dong Li , Yelong Shen , Ruoming Jin , Yi Mao , Kuan Wang , Weizhu Chen

分类：人工智能 | 自然语言处理

2022-12-20

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing the documentation code pairs by embedding them into latent space, without the association of external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process - sketching an answer before searching, in this work, we utilize the powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it would be helpful to augment the documentation query with its generation counterpart - generated code snippets from the code generation model. To the best of our knowledge, this is the first attempt that leverages the code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.

translated by 谷歌翻译

Pay Attention to Your Tone: Introducing a New Dataset for Polite Language Rewrite

Xun Wang , Tao Ge , Allen Mao , Yuki Li , Furu Wei , Si-Qing Chen

分类：自然语言处理

2022-12-20

We introduce \textsc{PoliteRewrite} -- a dataset for polite language rewrite which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemistically and politely, which is more challenging -- not only for NLP models but also for human annotators to rewrite with effort. To alleviate the human effort for efficient annotation, we first propose a novel annotation paradigm by a collaboration of human annotators and GPT-3.5 to annotate \textsc{PoliteRewrite}. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and human, which can be used as gold standard for training, validation and test; and 100K high-quality polite sentence rewrites by GPT-3.5 without human review. We wish this work (The dataset (10K+100K) will be released soon) could contribute to the research on more challenging sentence rewrite, and provoke more thought in future on resource annotation paradigm with the help of the large-scaled pretrained models.

translated by 谷歌翻译

StyleFlow: Disentangle Latent Representations via Normalizing Flow for Unsupervised Text Style Transfer

Kangchen Zhu , Zhiliang Tian , Ruifeng Luo , Xiaoguang Mao

分类：自然语言处理

2022-12-19

Text style transfer aims to alter the style of a sentence while preserving its content. Due to the lack of parallel corpora, most recent work focuses on unsupervised methods and often uses cycle construction to train models. Since cycle construction helps to improve the style transfer ability of the model by rebuilding transferred sentences back to original-style sentences, it brings about a content loss in unsupervised text style transfer tasks. In this paper, we propose a novel disentanglement-based style transfer model StyleFlow to enhance content preservation. Instead of the typical encoder-decoder scheme, StyleFlow can not only conduct the forward process to obtain the output, but also infer to the input through the output. We design an attention-aware coupling layers to disentangle the content representations and the style representations of a sentence. Besides, we propose a data augmentation method based on Normalizing Flow to improve the robustness of the model. Experiment results demonstrate that our model preserves content effectively and achieves the state-of-the-art performance on the most metrics.

translated by 谷歌翻译